SUMMARY In 1973, Arimoto proved the strong converse
theorem for discrete memoryless channels, which states that when
the transmission rate R is above the channel capacity C, the error
probability of decoding goes to one as the block length n of the
code words tends to infinity. He proved the theorem by deriving an
exponent function for the probability of correct decoding that
is positive if and only if R > C. Subsequently, in 1979, Dueck
and Korner determined the optimal exponent of correct decoding.
Arimoto's bound has been said to be equal to the bound
of Dueck and Korner. However, a rigorous proof of this equality
has not been presented so far. In this paper we give a rigorous
proof of the equivalence of Arimoto's bound to that of Dueck and
Korner.
key words: strong converse theorem, discrete memoryless channels,
exponent of correct decoding

1. Introduction 



For some classes of noisy channels, the error probability of
decoding goes to one as the block length n of the transmitted
codes tends to infinity at rates above the channel capacity.
This is well known as the strong converse theorem for noisy
channels. In 1957, Wolfowitz [1] proved the strong converse
theorem for discrete memoryless channels (DMCs). His result is
the first result on the strong converse theorem.

In 1973, Arimoto [2] obtained a stronger result on the strong
converse theorem for DMCs. He proved that the error probability
of decoding goes to one exponentially and derived a lower bound
of the exponent function. To prove this strong converse theorem
he introduced an interesting bounding technique based on a
symmetrical structure of the set of transmission codes. Using
this bounding method and an analytical argument on convex
functions developed by Gallager [3], he derived the lower bound.

Subsequently, Dueck and Korner [4] determined the optimal
exponent function for the error probability of decoding to go
to one. They derived the result by using a combinatorial method
based on the types of sequences. Their method is quite different
from the method of Arimoto [2]. In their paper, Dueck and
Korner [4] stated that their optimal bound can be proved to be
equal to the lower bound of Arimoto [2] by analytical
computation. However, since their statement we have found no
rigorous proof of this equality in the literature.

In this paper we give a rigorous proof of the equality of the
lower bound of Arimoto [2] and the optimal bound of Dueck and
Korner [4]. To prove this equality, we need to prove the
convexity of the optimal exponent function. We prove this by
using an operational meaning of the optimal exponent function.
Contrary to their statement, the arguments of our proof are not
completely analytical. A dual equivalence of two exponent
functions was established by Csiszar and Korner [5] for the
exponent functions of the error probability of decoding going
to zero at rates below capacity. Their arguments for the proof
of equivalence are completely analytical. We compare our
arguments with theirs to clarify an essential difference
between them.

2. Coding Theorems for Discrete Memoryless 
Channels 

We consider a discrete memoryless channel with input set
$\mathcal{X}$ and output set $\mathcal{Y}$. We assume that
$\mathcal{X}$ and $\mathcal{Y}$ are finite sets. Let $X^n$ be a
random variable taking values in $\mathcal{X}^n$. Suppose that
$X^n$ has a probability distribution on $\mathcal{X}^n$ denoted
by $P_{X^n} = \{P_{X^n}(\mathbf{x})\}_{\mathbf{x}\in\mathcal{X}^n}$.
Let $Y^n \in \mathcal{Y}^n$ be a random variable obtained as the
channel output by connecting $X^n$ to the input of the channel.
We write the conditional distribution of $Y^n$ given $X^n$ as
$W^n = \{W^n(\mathbf{y}|\mathbf{x})\}_{(\mathbf{x},\mathbf{y})\in\mathcal{X}^n\times\mathcal{Y}^n}$.
A noisy channel is defined by a sequence of stochastic matrices
$\{W^n\}_{n=1}^{\infty}$. In particular, a stationary discrete
memoryless channel is defined by a stochastic matrix with
input set $\mathcal{X}$ and output set $\mathcal{Y}$. We write
this stochastic matrix as
$W = \{W(y|x)\}_{(x,y)\in\mathcal{X}\times\mathcal{Y}}$.

Information transmission using the above noisy channel is
formulated as follows. Let $\mathcal{M}_n$ be a message set to
be transmitted through the channel. Set $M_n = |\mathcal{M}_n|$.
For given $W$, an $(n, M_n, \varepsilon_n)$-code is a set
$\{(\mathbf{x}(m), \mathcal{D}(m)),\ m \in \mathcal{M}_n\}$ that
satisfies the following:

1) $\mathbf{x}(m) \in \mathcal{X}^n$,

2) $\mathcal{D}(m)$, $m \in \mathcal{M}_n$, are disjoint subsets of $\mathcal{Y}^n$,

3) $\varepsilon_n = \frac{1}{M_n}\sum_{m\in\mathcal{M}_n} W^n\left((\mathcal{D}(m))^{c}\,\middle|\,\mathbf{x}(m)\right)$,

where $\mathcal{D}(m)$, $m \in \mathcal{M}_n$, are the decoding
regions of the code and $\varepsilon_n$ is the error probability
of decoding.
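The definition of the average error probability can be checked on a toy example. The following is a minimal sketch, assuming a binary symmetric channel with crossover probability 0.1 and a length-3 repetition code with majority decoding; the channel, the code, and the helper names `block_prob` and `average_error` are illustrative choices, not taken from the paper.

```python
from itertools import product
from math import prod

def block_prob(W, x, y):
    """W^n(y|x) for a memoryless channel: product of single-letter probabilities."""
    return prod(W[xi][yi] for xi, yi in zip(x, y))

def average_error(W, codewords, regions, n, out_alphabet):
    """eps_n = (1/M_n) * sum over m of W^n(complement of D(m) | x(m))."""
    total = 0.0
    for x, region in zip(codewords, regions):
        total += sum(block_prob(W, x, y)
                     for y in product(out_alphabet, repeat=n)
                     if y not in region)
    return total / len(codewords)

# Assumed example: BSC(0.1), repetition code of length 3, majority decoding.
p = 0.1
W = [[1 - p, p], [p, 1 - p]]          # W[x][y] = W(y|x)
n = 3
codewords = [(0, 0, 0), (1, 1, 1)]
all_y = list(product((0, 1), repeat=n))
regions = [{y for y in all_y if sum(y) <= 1},   # D(0): at most one 1 received
           {y for y in all_y if sum(y) >= 2}]   # D(1): at least two 1s received
eps = average_error(W, codewords, regions, n, (0, 1))
```

For this code the value is $3p^2(1-p) + p^3$, the probability that at least two of the three symbols are flipped.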

A transmission rate $R$ is achievable if there exists a sequence
of $(n, M_n, \varepsilon_n)$-codes, $n = 1, 2, \cdots$, such that

\[
\limsup_{n\to\infty} \varepsilon_n = 0, \qquad
\liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R. \tag{1}
\]



Let the supremum of achievable transmission rates $R$ be
denoted by $C(W)$, which we call the channel capacity.
It is well known that $C(W)$ is given by the following
formula:

\[
C(W) = \max_{P\in\mathcal{P}(\mathcal{X})} I(P;W), \tag{2}
\]

where $\mathcal{P}(\mathcal{X})$ is the set of probability
distributions on $\mathcal{X}$ and $I(P;W)$ stands for the mutual
information between $X$ and $Y$ when the input distribution of
$X$ is $P$.
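The capacity formula can be evaluated numerically. The sketch below assumes natural logarithms (rates in nats) and uses the Blahut-Arimoto iteration, a standard algorithm that the paper itself does not use; the binary symmetric channel with crossover probability 0.1 is a hypothetical test case.

```python
from math import exp, log

def mutual_information(P, W):
    """I(P;W) in nats for input distribution P and channel matrix W[x][y]."""
    nx, ny = len(W), len(W[0])
    q = [sum(P[x] * W[x][y] for x in range(nx)) for y in range(ny)]  # output law
    return sum(P[x] * W[x][y] * log(W[x][y] / q[y])
               for x in range(nx) for y in range(ny)
               if P[x] > 0 and W[x][y] > 0)

def capacity(W, iters=200):
    """Approximate C(W) = max_P I(P;W) by Blahut-Arimoto iterations."""
    nx, ny = len(W), len(W[0])
    P = [1.0 / nx] * nx
    for _ in range(iters):
        q = [sum(P[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # d[x] = divergence of row W(.|x) from the current output law q
        d = [sum(W[x][y] * log(W[x][y] / q[y]) for y in range(ny) if W[x][y] > 0)
             for x in range(nx)]
        weights = [P[x] * exp(d[x]) for x in range(nx)]
        total = sum(weights)
        P = [w / total for w in weights]
    return mutual_information(P, W), P

p = 0.1                                # assumed crossover probability
bsc = [[1 - p, p], [p, 1 - p]]
C, P_star = capacity(bsc)
```

For the binary symmetric channel the uniform input is optimal, so the result should agree with the closed form $\log 2 + p\log p + (1-p)\log(1-p)$.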

To examine the asymptotic behavior of $\varepsilon_n$ for large
$n$ at $R < C(W)$, we define the following quantities.
For given $R > 0$, the quantity $E$ is an achievable error
exponent if there exists a sequence of
$(n, M_n, \varepsilon_n)$-codes, $n = 1, 2, \cdots$, such that

\[
\liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R, \qquad
\liminf_{n\to\infty} \left(-\frac{1}{n}\right)\log \varepsilon_n \ge E.
\]

The supremum of the achievable error exponents $E$ is
denoted by $E^{*}(R|W)$. Several lower and upper bounds
of $E^{*}(R|W)$ have been derived so far. An explicit form
of $E^{*}(R|W)$ is known for large $R$ below $C(W)$. An
explicit formula of $E^{*}(R|W)$ for all $R$ below $C(W)$ is
not yet known.

3. Strong Converse Theorems for Discrete 
Memoryless Channels 

Wolfowitz [1] first established the strong converse theorem
for DMCs by proving that when $R > C(W)$, we have
$\lim_{n\to\infty} \varepsilon_n = 1$. When the strong converse
theorem holds, we are interested in the rate at which the
error probability of decoding tends to one as $n \to \infty$
for $R > C(W)$. To examine this rate of convergence, we define
the following quantity. For given $R > 0$, the quantity $G$ is
an achievable exponent if there exists a sequence of
$(n, M_n, \varepsilon_n)$-codes, $n = 1, 2, \cdots$, such that

\[
\liminf_{n\to\infty} \frac{1}{n}\log M_n \ge R, \qquad
\limsup_{n\to\infty} \left(-\frac{1}{n}\right)\log(1 - \varepsilon_n) \le G.
\]

The infimum of the achievable exponents $G$ is denoted
by $G^{*}(R|W)$. This quantity has the following property.

Property 1: The function $G^{*}(R|W)$ is a monotone
increasing and convex function of $R$.

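Proof: By definition it is obvious that $G^{*}(R|W)$
is a monotone increasing function of $R$. To prove the
convexity, fix two positive rates $R_1$, $R_2$ arbitrarily. For
each $R_i$, $i = 1,2$, we consider the infimum of the achievable
exponent function $G^{*}(R_i|W)$. By the definitions
of $G^{*}(R_i|W)$, $i = 1,2$, for each $i = 1,2$, there exists
a sequence of $(n, M_n^{(i)}, \varepsilon_n^{(i)})$-codes,
$n = 1,2,\cdots$, such that

\[
\liminf_{n\to\infty} \frac{1}{n}\log M_n^{(i)} \ge R_i, \qquad
\limsup_{n\to\infty} \left(-\frac{1}{n}\right)\log\left(1 - \varepsilon_n^{(i)}\right) \le G^{*}(R_i|W).
\]

Fix any $\lambda_i > 0$, $i = 1,2$, with $\lambda_1 + \lambda_2 = 1$
and set $n_i = \lfloor \lambda_i n \rfloor$, where $\lfloor a \rfloor$
stands for the integer part of $a$. Set $\nu = n - n_1 - n_2$. It is
obvious that $\nu \in \{0, 1, 2\}$.

Next, we consider the code obtained by concatenating
$(n_i, M_{n_i}^{(i)}, \varepsilon_{n_i}^{(i)})$-codes for $i = 1,2$.
If $\nu = 1$ or $2$, we further append a $(\nu, 1, 0)$-code. For the
$(n, M_n, \varepsilon_n)$-code constructed above we have

\[
M_n = \prod_{i=1,2} M_{n_i}^{(i)}, \qquad
1 - \varepsilon_n = \prod_{i=1,2} \left(1 - \varepsilon_{n_i}^{(i)}\right).
\]

Then, we have

\[
\liminf_{n\to\infty} \frac{1}{n}\log M_n
\ge \sum_{i=1,2} \liminf_{n\to\infty} \frac{n_i}{n}\cdot\frac{1}{n_i}\log M_{n_i}^{(i)}
\ge \sum_{i=1,2} \lambda_i R_i,
\]

\[
\limsup_{n\to\infty} \left(-\frac{1}{n}\right)\log(1 - \varepsilon_n)
= \limsup_{n\to\infty} \sum_{i=1,2} \frac{n_i}{n}\left(-\frac{1}{n_i}\right)\log\left(1 - \varepsilon_{n_i}^{(i)}\right)
\le \sum_{i=1,2} \lambda_i G^{*}(R_i|W).
\]

Hence, we have

\[
\sum_{i=1,2} \lambda_i G^{*}(R_i|W) \ge G^{*}\left(\sum_{i=1,2} \lambda_i R_i \,\middle|\, W\right),
\]

which implies the convexity of $G^{*}(R|W)$. □

Arimoto [2] derived a lower bound of $G^{*}(R|W)$.
To state his result we define some functions. For
$\delta \in [-1, +\infty)$, define

\[
J_\delta(P|W) \triangleq -\log \sum_{y\in\mathcal{Y}}
\left[ \sum_{x\in\mathcal{X}} P(x) W(y|x)^{\frac{1}{1+\delta}} \right]^{1+\delta},
\]

\[
F_\delta(R,P|W) \triangleq -\delta R + J_\delta(P|W), \qquad
G_\delta(R|W) \triangleq \min_{P\in\mathcal{P}(\mathcal{X})} F_\delta(R,P|W).
\]

Furthermore, set

\[
G(R|W) \triangleq \max_{-1\le\delta\le 0} G_\delta(R|W)
= \max_{-1\le\delta\le 0}\ \min_{P\in\mathcal{P}(\mathcal{X})} F_\delta(R,P|W)
= \max_{-1\le\delta\le 0} \left[ -\delta R + \min_{P\in\mathcal{P}(\mathcal{X})} J_\delta(P|W) \right].
\]

According to Arimoto [2], the following property
holds.

Property 2: The function $G(R|W)$ is a monotone
increasing and convex function of $R$ and is positive if and
only if $R > C(W)$.

Arimoto [2] proved the following theorem.

Theorem 1: For any $R > 0$, $G^{*}(R|W) \ge G(R|W)$.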
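Arimoto's lower bound $G(R|W)$ can be approximated numerically for a small channel. The following is a rough sketch under stated assumptions: a binary-input channel, natural logarithms, a grid over $P = (a, 1-a)$, and a grid over $\delta$ that stops at $-0.99$ to avoid the singular endpoint $\delta = -1$. The function names and grid sizes are illustrative.

```python
from math import log

def J(delta, P, W):
    """J_delta(P|W) = -log sum_y ( sum_x P(x) W(y|x)^(1/(1+delta)) )^(1+delta)."""
    s = 1.0 / (1.0 + delta)            # requires delta > -1
    total = 0.0
    for y in range(len(W[0])):
        inner = sum(P[x] * W[x][y] ** s for x in range(len(P)))
        total += inner ** (1.0 + delta)
    return -log(total)

def arimoto_bound(R, W, n_delta=100, n_p=200):
    """Grid approximation of max over delta in (-1, 0] of
    min over P of [-delta*R + J_delta(P|W)], binary input only."""
    best = float("-inf")
    for k in range(n_delta):
        delta = -k / n_delta           # 0, -0.01, ..., -0.99
        inner_min = min(-delta * R + J(delta, [i / n_p, 1 - i / n_p], W)
                        for i in range(n_p + 1))
        best = max(best, inner_min)
    return best

p = 0.1                                # assumed BSC crossover probability
bsc = [[1 - p, p], [p, 1 - p]]
cap = log(2) + p * log(p) + (1 - p) * log(1 - p)   # C(W) in nats, about 0.368
G_below = arimoto_bound(0.2, bsc)      # rate below capacity
G_above = arimoto_bound(0.6, bsc)      # rate above capacity
```

In this example the bound vanishes (up to rounding) at the rate below capacity and is strictly positive at the rate above capacity, in line with Property 2.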

Arimoto [2] derived the lower bound $G(R|W)$ of
$G^{*}(R|W)$ by an analytical method. Subsequently,
Dueck and Korner [4] determined $G^{*}(R|W)$ by a
combinatorial method quite different from that of Arimoto.
To state their result, for $P \in \mathcal{P}(\mathcal{X})$ and
$R \ge 0$, we define the following function:

\[
F^{+}_{-1}(R,P|W) \triangleq \min_{V\in\mathcal{P}(\mathcal{Y}|\mathcal{X})}
\left\{ [R - I(P;V)]^{+} + D(V\|W|P) \right\},
\]

where $\mathcal{P}(\mathcal{Y}|\mathcal{X})$ is the set of all noisy
channels with input $\mathcal{X}$ and output $\mathcal{Y}$ and
$[a]^{+} = \max\{a, 0\}$. Furthermore, for $R \ge 0$, define

\[
G^{+}_{-1}(R|W) \triangleq \min_{P\in\mathcal{P}(\mathcal{X})} F^{+}_{-1}(R,P|W),
\]

and for $0 \le R \le \log|\mathcal{X}|$, define

\[
G_{\rm sp}(R|W) \triangleq \min_{P\in\mathcal{P}(\mathcal{X})}\
\min_{\substack{V\in\mathcal{P}(\mathcal{Y}|\mathcal{X}):\\ I(P;V)\ge R}} D(V\|W|P).
\]

The suffix "sp" of the function $G_{\rm sp}(R|W)$ reflects the fact
that it has the form of the sphere packing exponent function.
Those functions satisfy the following.
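The quantity minimized in the Dueck and Korner exponent, $[R - I(P;V)]^{+} + D(V\|W|P)$, is easy to evaluate for a given pair $(P, V)$, and any such evaluation upper-bounds the minimum. The sketch below assumes the standard definition $D(V\|W|P) = \sum_x P(x)\sum_y V(y|x)\log\frac{V(y|x)}{W(y|x)}$, natural logarithms, and a hypothetical binary symmetric channel; the function names are illustrative.

```python
from math import log

def mutual_information(P, V):
    """I(P;V) in nats for input distribution P and channel matrix V[x][y]."""
    nx, ny = len(V), len(V[0])
    q = [sum(P[x] * V[x][y] for x in range(nx)) for y in range(ny)]
    return sum(P[x] * V[x][y] * log(V[x][y] / q[y])
               for x in range(nx) for y in range(ny)
               if P[x] > 0 and V[x][y] > 0)

def conditional_divergence(V, W, P):
    """D(V||W|P); infinite if V puts mass where W does not."""
    total = 0.0
    for x in range(len(P)):
        for y in range(len(W[0])):
            if P[x] > 0 and V[x][y] > 0:
                if W[x][y] == 0:
                    return float("inf")
                total += P[x] * V[x][y] * log(V[x][y] / W[x][y])
    return total

def dk_objective(R, P, V, W):
    """[R - I(P;V)]^+ + D(V||W|P) for one candidate pair (P, V)."""
    return max(R - mutual_information(P, V), 0.0) + conditional_divergence(V, W, P)

W = [[0.9, 0.1], [0.1, 0.9]]           # assumed true channel: BSC(0.1)
V = [[0.95, 0.05], [0.05, 0.95]]       # assumed test channel: BSC(0.05)
P = [0.5, 0.5]
R = 0.6                                # a rate above capacity (about 0.368 nats)
obj_W = dk_objective(R, P, W, W)       # V = W: reduces to R - I(P;W)
obj_V = dk_objective(R, P, V, W)       # less noisy V: smaller rate gap, small divergence cost
```

Choosing $V = W$ gives the simple upper bound $[R - I(P;W)]^{+}$; the second choice shows how a less noisy test channel can trade a small divergence penalty for a smaller rate gap.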

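Property 3:

a) The function $G_{\rm sp}(R|W)$ is monotone increasing for
$0 \le R \le \log|\mathcal{X}|$ and takes a positive value if and
only if $R > C(W)$.

b) For $0 \le R \le \log|\mathcal{X}|$, we have

\[
G^{+}_{-1}(R|W) = G_{\rm sp}(R|W).
\]

Furthermore, for $R \ge \log|\mathcal{X}|$, we have

\[
G^{+}_{-1}(R|W) = \tilde{G}_{-1}(R|W), \qquad
\tilde{G}_{-1}(R|W) \triangleq \min_{P\in\mathcal{P}(\mathcal{X})}\ \min_{V\in\mathcal{P}(\mathcal{Y}|\mathcal{X})}
\left\{ R - I(P;V) + D(V\|W|P) \right\}.
\]

c) For any $R, R' \ge 0$,

\[
\left| G^{+}_{-1}(R|W) - G^{+}_{-1}(R'|W) \right| \le |R - R'|.
\]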

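Proof: Property 3 part a) is obvious. The proof of
part c) is found in Dueck and Korner [4]. In this paper
we prove part b). To prove the first equality, for
fixed $P \in \mathcal{P}(\mathcal{X})$, we set

\[
G_{\rm sp}(R,P|W) \triangleq
\min_{\substack{V\in\mathcal{P}(\mathcal{Y}|\mathcal{X}):\\ I(P;V)\ge R}} D(V\|W|P),
\]

\[
\hat{F}_{-1}(R,P|W) \triangleq
\min_{\substack{V\in\mathcal{P}(\mathcal{Y}|\mathcal{X}):\\ I(P;V)\le R}}
\left\{ R - I(P;V) + D(V\|W|P) \right\}.
\]

It is obvious that

\[
F^{+}_{-1}(R,P|W) = \min\left\{ G_{\rm sp}(R,P|W),\ \hat{F}_{-1}(R,P|W) \right\}, \tag{3}
\]

\[
G_{\rm sp}(R|W) = \min_{P\in\mathcal{P}(\mathcal{X})} G_{\rm sp}(R,P|W). \tag{4}
\]

Since $-I(P;V) + D(V\|W|P)$ is a linear function of
$V$, the minimum in the definition of $\hat{F}_{-1}(R,P|W)$ is
attained by some $V$ satisfying $I(P;V) = R$. Then, by (3), we have

\[
F^{+}_{-1}(R,P|W) = G_{\rm sp}(R,P|W).
\]

From the above equality and (4), we obtain the first
equality. The second equality is obvious since
$R - I(P;V) \ge 0$ when $R \ge \log|\mathcal{X}|$. □

Dueck and Korner [4] proved the following.

Theorem 2: For any $R > 0$,

\[
G^{+}_{-1}(R|W) = G^{*}(R|W).
\]

Although the lower bound derived by Arimoto [2]
has a form quite different from the optimal exponent
determined by Dueck and Korner [4], the former coincides
with the latter, i.e., the following theorem holds.

Theorem 3: For any $R > 0$,

\[
G^{+}_{-1}(R|W) = G(R|W),
\]

or equivalently,

\[
\max_{-1\le\delta\le 0}\ \min_{P\in\mathcal{P}(\mathcal{X})}
\left\{ -\delta R - \log \sum_{y\in\mathcal{Y}}
\left[ \sum_{x\in\mathcal{X}} P(x) W(y|x)^{\frac{1}{1+\delta}} \right]^{1+\delta} \right\}
= \min_{P\in\mathcal{P}(\mathcal{X})}\ \min_{V\in\mathcal{P}(\mathcal{Y}|\mathcal{X})}
\left\{ [R - I(P;V)]^{+} + D(V\|W|P) \right\}.
\]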

The result of Theorem 3 is stated in Csiszar and 
Korner [5] without proof. Dueck and Korner [4] stated 
that the equivalence between their bound and that of 
Arimoto [2] can be proved by an analytical computa- 
tion. In the next section we give a rigorous proof of the 
above theorem. Contrary to their statement, our proof 
is not completely analytical. 

4. Proof of Theorem 3 

In this section we prove Theorem 3. The following is a
key lemma for the proof.

Lemma 1: The function $G^{+}_{-1}(R|W)$ is a monotone
increasing and convex function of $R > 0$.

Proof: The result follows from the convexity of
$G^{*}(R|W)$ and Theorem 2. □

Remark 1: We first tried to prove Lemma 1 by an
analytical computation but did not succeed in proving
the lemma via this approach. According to [6], for each
fixed $P \in \mathcal{P}(\mathcal{X})$, $F^{+}_{-1}(R,P|W)$ is a
convex function of $R > 0$. However, this does not imply the
convexity of $G^{+}_{-1}(R|W)$ with respect to $R > 0$, since a
pointwise minimum of convex functions need not be convex.

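Next, for $R > 0$, we set

\[
\tilde{F}_\delta(R,P|W) \triangleq \min_{V\in\mathcal{P}(\mathcal{Y}|\mathcal{X})}
\left\{ \delta\,[I(P;V) - R] + D(V\|W|P) \right\}, \qquad
\tilde{G}_\delta(R|W) \triangleq \min_{P\in\mathcal{P}(\mathcal{X})} \tilde{F}_\delta(R,P|W).
\]

Then, we have the following two lemmas.

Lemma 2: For any $R > 0$,

\[
G^{+}_{-1}(R|W) = \max_{-1\le\delta\le 0} \tilde{G}_\delta(R|W).
\]

Lemma 3: For any $R > 0$, $-1 \le \delta \le 0$, and any
$P \in \mathcal{P}(\mathcal{X})$, we have

\[
\tilde{F}_\delta(R,P|W) \ge F_\delta(R,P|W).
\]

Furthermore, for any $R > 0$ and $-1 \le \delta \le 0$,

\[
\tilde{G}_\delta(R|W) = G_\delta(R|W).
\]

It is obvious that Theorem 3 immediately follows
from Lemmas 2 and 3. Those two lemmas can be proved
by analytical computations. In the following we prove
Lemma 2. The proof of Lemma 3 is omitted here. For
the details see Oohama [7].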
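The first inequality of Lemma 3 can be probed numerically. The sketch below assumes the reading that the divergence-form function dominates Arimoto's function, which amounts to $\delta I(P;V) + D(V\|W|P) \ge J_\delta(P|W)$ for every channel $V$ (the term $-\delta R$ cancels on both sides). The random channel, the sampling range $-0.9 < \delta \le 0$, and all function names are illustrative assumptions; a numerical check of this kind is not a proof.

```python
import random
from math import log

def J(delta, P, W):
    """J_delta(P|W) = -log sum_y ( sum_x P(x) W(y|x)^(1/(1+delta)) )^(1+delta)."""
    s = 1.0 / (1.0 + delta)
    total = 0.0
    for y in range(len(W[0])):
        inner = sum(P[x] * W[x][y] ** s for x in range(len(P)))
        total += inner ** (1.0 + delta)
    return -log(total)

def mutual_information(P, V):
    nx, ny = len(V), len(V[0])
    q = [sum(P[x] * V[x][y] for x in range(nx)) for y in range(ny)]
    return sum(P[x] * V[x][y] * log(V[x][y] / q[y])
               for x in range(nx) for y in range(ny))

def conditional_divergence(V, W, P):
    return sum(P[x] * V[x][y] * log(V[x][y] / W[x][y])
               for x in range(len(P)) for y in range(len(W[0])))

def random_dist(rng, k):
    """Random strictly positive probability vector of length k."""
    w = [rng.random() + 1e-3 for _ in range(k)]
    s = sum(w)
    return [v / s for v in w]

rng = random.Random(0)                 # fixed seed for reproducibility
nx, ny = 3, 4
W = [random_dist(rng, ny) for _ in range(nx)]
min_gap = float("inf")
for _ in range(2000):
    delta = -0.9 * rng.random()        # sampled in (-0.9, 0]
    P = random_dist(rng, nx)
    V = [random_dist(rng, ny) for _ in range(nx)]
    gap = (delta * mutual_information(P, V)
           + conditional_divergence(V, W, P) - J(delta, P, W))
    min_gap = min(min_gap, gap)
```

A nonnegative `min_gap` over the sampled triples $(\delta, P, V)$ is consistent with the inequality; equality between the two families is only claimed after minimizing over both $V$ and $P$.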

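Proof of Lemma 2: From its formula, it is obvious
that

\[
G^{+}_{-1}(R|W) \ge \max_{-1\le\delta\le 0} \tilde{G}_\delta(R|W).
\]

In particular, from Property 3 part b), the equality
holds for $R \ge \log|\mathcal{X}|$. Then, again by Property 3 part
b), it suffices to prove that for $0 \le R \le \log|\mathcal{X}|$,
there exists $-1 \le \delta \le 0$ such that

\[
G_{\rm sp}(R|W) = \tilde{G}_\delta(R|W).
\]

For $-1 \le \delta \le 0$, we set

\[
K_\delta(W) \triangleq \max_{P\in\mathcal{P}(\mathcal{X})}\ \max_{V\in\mathcal{P}(\mathcal{Y}|\mathcal{X})}
\left[ -\delta I(P;V) - D(V\|W|P) \right].
\]

Then, by the definition of $\tilde{G}_\delta(R|W)$, we have the
following:

\[
\tilde{G}_\delta(R|W) = -\delta R - K_\delta(W).
\]

Next, observe that by Property 3 part b) and Lemma 1,
$G_{\rm sp}(R|W)$ is a monotone increasing and convex function
of $R$. By this property and Property 3 part c), for
any $0 \le R \le \log|\mathcal{X}|$, there exists $-1 \le \delta \le 0$
such that for any $0 \le R' \le \log|\mathcal{X}|$, we have

\[
G_{\rm sp}(R'|W) \ge G_{\rm sp}(R|W) - \delta(R' - R).
\]

Let $(P, V)$ be a pair of an input distribution and a channel that
attains $G_{\rm sp}(R|W)$. For any pair $(P', V')$, set
$R' = I(P';V')$. Then, we have the following chain of
inequalities:

\[
-\delta I(P';V') - D(V'\|W|P')
\le -\delta R' - G_{\rm sp}(R'|W)
\le -\delta R - G_{\rm sp}(R|W)
\le -\delta I(P;V) - D(V\|W|P).
\]

The above chain of inequalities implies that

\[
K_\delta(W) = -\delta I(P;V) - D(V\|W|P) = -\delta R - G_{\rm sp}(R|W).
\]

Combined with $\tilde{G}_\delta(R|W) = -\delta R - K_\delta(W)$, this
gives $G_{\rm sp}(R|W) = \tilde{G}_\delta(R|W)$, which completes the
proof. □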

5. Comparison with the Proof of the Dual Result

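Theorem 3 has some duality with a result stated in
Csiszar and Korner [5]. To describe their result we
define

\[
E_\delta(R|W) \triangleq \max_{P\in\mathcal{P}(\mathcal{X})} F_\delta(R,P|W),
\]

\[
E(R|W) \triangleq \max_{\delta\ge 0} E_\delta(R|W)
= \max_{\delta\ge 0}\ \max_{P\in\mathcal{P}(\mathcal{X})} F_\delta(R,P|W)
= \max_{\delta\ge 0} \left[ -\delta R + \max_{P\in\mathcal{P}(\mathcal{X})} J_\delta(P|W) \right].
\]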
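Restricting $\delta$ to $0 \le \delta \le 1$ in the maximization above gives Gallager's random coding exponent. The following sketch approximates that restricted maximum on a grid, assuming a binary-input channel, natural logarithms, and a hypothetical binary symmetric channel with crossover probability 0.1; the function names and grid sizes are illustrative.

```python
from math import log

def J(delta, P, W):
    """J_delta(P|W) = -log sum_y ( sum_x P(x) W(y|x)^(1/(1+delta)) )^(1+delta)."""
    s = 1.0 / (1.0 + delta)
    total = 0.0
    for y in range(len(W[0])):
        inner = sum(P[x] * W[x][y] ** s for x in range(len(P)))
        total += inner ** (1.0 + delta)
    return -log(total)

def random_coding_exponent(R, W, n_delta=100, n_p=200):
    """Grid approximation of max over 0 <= delta <= 1 of
    max over P of [-delta*R + J_delta(P|W)], binary input only."""
    best = float("-inf")
    for k in range(n_delta + 1):
        delta = k / n_delta            # 0, 0.01, ..., 1
        inner_max = max(-delta * R + J(delta, [i / n_p, 1 - i / n_p], W)
                        for i in range(n_p + 1))
        best = max(best, inner_max)
    return best

p = 0.1                                # assumed BSC crossover probability
bsc = [[1 - p, p], [p, 1 - p]]
E_below = random_coding_exponent(0.2, bsc)   # rate below capacity (about 0.368 nats)
E_above = random_coding_exponent(0.5, bsc)   # rate above capacity
```

In this example the value is positive below capacity and vanishes (up to rounding) above capacity, mirroring the behavior of $G(R|W)$ on the other side of $C(W)$.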



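An explicit lower bound of $E^{*}(R|W)$ was first derived
by Gallager [8]. He showed that the function
$\max_{0\le\delta\le 1} E_\delta(R|W)$ serves as a lower bound of
$E^{*}(R|W)$. Next, we set

\[
C_0(W) \triangleq \max_{P\in\mathcal{P}(\mathcal{X})}\
\min_{\substack{V\in\mathcal{P}(\mathcal{Y}|\mathcal{X}):\\ D(V\|W|P)<\infty}} I(P;V).
\]

According to Shannon, Gallager and Berlekamp [9],
$C_0(W)$ has the following formula:

\[
C_0(W) = -\min_{P\in\mathcal{P}(\mathcal{X})}\ \max_{y\in\mathcal{Y}}
\log \sum_{x\in\mathcal{X}:\, W(y|x)>0} P(x).
\]

For $R \ge C_0(W)$, define

\[
E_{\rm sp}(R|W) \triangleq \max_{P\in\mathcal{P}(\mathcal{X})}\
\min_{\substack{V\in\mathcal{P}(\mathcal{Y}|\mathcal{X}):\\ I(P;V)\le R}} D(V\|W|P).
\]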

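According to Csiszar and Korner [5], $E_{\rm sp}(R|W)$ serves
as an upper bound of $E^{*}(R|W)$ and matches it for large
$R$ below $C(W)$. Csiszar and Korner [5] obtained the
following result.

Theorem 4 (Csiszar and Korner [5]): For any
$R \ge C_0(W)$,

\[
E(R|W) = E_{\rm sp}(R|W),
\]

or equivalently,

\[
\max_{\delta\ge 0}\ \max_{P\in\mathcal{P}(\mathcal{X})}
\left\{ -\delta R - \log \sum_{y\in\mathcal{Y}}
\left[ \sum_{x\in\mathcal{X}} P(x) W(y|x)^{\frac{1}{1+\delta}} \right]^{1+\delta} \right\}
= \max_{P\in\mathcal{P}(\mathcal{X})}\
\min_{\substack{V\in\mathcal{P}(\mathcal{Y}|\mathcal{X}):\\ I(P;V)\le R}} D(V\|W|P).
\]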

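In the following we outline the arguments of the
proof of the above theorem and compare them with
those of the proof of Theorem 3.

By an analytical computation we have the following
lemma.

Lemma 4: The function $E_{\rm sp}(R|W)$ is a monotone
decreasing and convex function of $R \ge C_0(W)$ and is
positive if and only if $C_0(W) \le R < C(W)$.

Next, for $R > 0$ and $\delta \ge 0$, we define

\[
\tilde{E}_\delta(R|W) \triangleq \max_{P\in\mathcal{P}(\mathcal{X})}\
\min_{V\in\mathcal{P}(\mathcal{Y}|\mathcal{X})}
\left\{ \delta\,[I(P;V) - R] + D(V\|W|P) \right\},
\]

where the inner minimum has the same form as the function
$\tilde{F}_\delta(R,P|W)$ used in the proof of Theorem 3.
Then, we have the following two lemmas.

Lemma 5: For any $R \ge C_0(W)$,

\[
E_{\rm sp}(R|W) = \max_{\delta\ge 0} \tilde{E}_\delta(R|W).
\]

Lemma 6: For any $R > 0$, $\delta \ge 0$, and any
$P \in \mathcal{P}(\mathcal{X})$, we have

\[
\min_{V\in\mathcal{P}(\mathcal{Y}|\mathcal{X})}
\left\{ \delta\,[I(P;V) - R] + D(V\|W|P) \right\} \ge F_\delta(R,P|W).
\]

Furthermore, for any $R > 0$ and $\delta \ge 0$,

\[
\tilde{E}_\delta(R|W) = E_\delta(R|W).
\]

It is obvious that Theorem 4 immediately follows
from Lemmas 5 and 6. We prove Lemmas 5 and 6 in
manners quite similar to those of the proofs of Lemmas
2 and 3, respectively. We omit the details of the proofs.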

We compare the arguments of the proof of Theorem 3
with those of the proof of Theorem 4. An essential
difference between them is in the proof of the
convexity of the exponent functions. We can prove the
convexity of $E_{\rm sp}(R|W)$ by an analytical method. On
the other hand, the convexity of the Dueck and Korner
exponent function follows from Theorem 2, which identifies it
with $G^{*}(R|W)$, and the convexity of $G^{*}(R|W)$.
The proof of the convexity of $G^{*}(R|W)$ is based on an
operational meaning of the optimal exponent function
of $1 - \varepsilon_n$. We first tried an analytical proof of the
convexity of the Dueck and Korner exponent function but did not
succeed.
The difference between the arguments is summarized in Table 1.

  R > C(W)                               R < C(W)
  -------------------------------------  -------------------------------------
  G^*(R|W) = G^+_{-1}(R|W)               E^*(R|W) ≤ E_sp(R|W)
  (Theorem 2)                            (Open Problem)
  -------------------------------------  -------------------------------------
  Operational Meaning                    Convexity of E^*(R|W) ?
    ⇓
  Convexity of G^*(R|W)
  (Property 1)
  -------------------------------------  -------------------------------------
  Theorem 2 and Property 1               Analytical Computation
    ⇓                                      ⇓
  Convexity of G^+_{-1}(R|W)             Convexity of E_sp(R|W)
  (Lemma 1)                              (Lemma 4)
    ⇓                                      ⇓
  Lemma 2                                Lemma 5
  -------------------------------------  -------------------------------------
  Lemmas 2 and 3                         Lemmas 5 and 6
    ⇓                                      ⇓
  G(R|W) = G^+_{-1}(R|W)                 E(R|W) = E_sp(R|W)
  (Theorem 3)                            (Theorem 4)

Table 1 Difference between the arguments of the proof of
Theorem 3 and those of the proof of Theorem 4.